The Ontario Data User Group (DUG) has been collecting and analysing the number of COVID-19 cases reported by the Ontario government (https://www.ontario.ca/page/2019-novel-coronavirus). The data is stored on the DUG GitHub page and is open to anyone.
knitr::opts_chunk$set(echo = TRUE)
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(plotly))
df <- read_csv("Data/ontario_corona_cases.csv")
## Parsed with column specification:
## cols(
##   Case_number = col_double(),
##   age_gender = col_character(),
##   Public_Health_Unit = col_character(),
##   Hospital = col_character(),
##   Transmission = col_character(),
##   Status = col_character(),
##   Date = col_date(format = "")
## )
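The column specification printed above can also be passed back to `read_csv()` so that the types are fixed rather than guessed, which also silences the message. A minimal sketch, assuming the column names shown in the message; the two rows written to the temporary file are invented placeholders, not DUG data, and `toy_path`, `case_spec`, and `toy_cases` are illustrative names:

```r
library(readr)

# Invented placeholder rows, written to a temporary file for illustration only
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Case_number,age_gender,Public_Health_Unit,Hospital,Transmission,Status,Date",
  "1,example,Unit A,Hospital A,example,example,2020-01-25",
  "2,example,Unit B,Hospital B,example,example,2020-01-27"
), toy_path)

# Same specification as in the message above, supplied explicitly
case_spec <- cols(
  Case_number = col_double(),
  age_gender = col_character(),
  Public_Health_Unit = col_character(),
  Hospital = col_character(),
  Transmission = col_character(),
  Status = col_character(),
  Date = col_date(format = "")
)

toy_cases <- read_csv(toy_path, col_types = case_spec)
```

With an explicit specification, a change in the file's layout surfaces as parsing problems instead of a silently different column type.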
As of 2020-03-19 10:21:47, the Ontario government has reported 214 cases of COVID-19. The following chart shows the exponential growth in cumulative COVID-19 cases in Ontario.
# Record when the report was generated, for the on-chart annotation
current_time <- Sys.time()
df %>%
  # Count new cases per report date, then accumulate them
  count(Date) %>%
  mutate(cumulative_cases = cumsum(n)) %>%
  ggplot(aes(x = Date, y = cumulative_cases)) +
  geom_line() +
  geom_point() +
  # Leave headroom above the latest total for the point labels
  scale_y_continuous(limits = c(0, nrow(df) + 5)) +
  labs(title = "Number of COVID-19 Cases in Ontario",
       x = "Date Reported",
       y = "Number of Cases") +
  geom_text(aes(label = cumulative_cases), vjust = -1) +
  annotate("text", x = as.Date("2020-02-01"), y = nrow(df) - 5,
           label = paste0("Current as of\n", current_time))
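One way to check the exponential-growth reading of the chart is to regress the log of the cumulative count on time: under exponential growth the points fall on a straight line, and the doubling time is log(2) divided by the slope. A minimal sketch in base R on synthetic counts that double every day (not the Ontario data); `toy`, `fit`, `growth_rate`, and `doubling_time` are illustrative names:

```r
# Synthetic cumulative counts that double each day (illustration only)
toy <- data.frame(
  day = 0:5,
  cumulative = 2^(0:5)
)

# Under N(t) = N0 * exp(r * t), log(N) is linear in t with slope r
fit <- lm(log(cumulative) ~ day, data = toy)
growth_rate <- unname(coef(fit)["day"])

# Doubling time in days: log(2) / r
doubling_time <- log(2) / growth_rate
```

Applied to the real series, `day` would be the number of days since the first report, for example `as.numeric(Date - min(Date))`.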
Most cases are reported in Toronto; however, the virus is spreading to other regions. Double-click a region in the legend to see only that region.
# Create data frame of all dates and Public Health Units:
# one row per (unit, date) pair from 2020-01-25 through today.
# expand_grid() (tidyr >= 1.0.0) keeps the units in order of first appearance
# and avoids creating one global variable per unit with assign()/get().
dates_all <- expand_grid(
  Public_Health_Unit = unique(df$Public_Health_Unit),
  Date = seq(as.Date("2020-01-25"), Sys.Date(), by = "days")
)
# Create master data frame
cases_by_region <- dates_all %>%
  left_join(df, by = c("Date", "Public_Health_Unit")) %>%
  group_by(Public_Health_Unit) %>%
  mutate(Cases = cumsum(!is.na(Case_number))) %>%
  ungroup() %>%
  mutate(Public_Health_Unit = parse_factor(Public_Health_Unit))
plot_ly(cases_by_region, x = ~Date, y = ~Cases) %>%
  add_lines(color = ~Public_Health_Unit)
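The reason for joining the cases onto a complete date grid is that a region with no new cases on a given day still needs a row, so that its cumulative line stays flat instead of skipping that date. A small sketch with dplyr and tidyr on invented rows (the unit names and dates are placeholders), using `expand_grid()` as one possible way to build the grid:

```r
library(dplyr)
library(tidyr)

# Invented case list: Unit A reports on days 1 and 3, Unit B on day 2
toy_cases <- tibble(
  Case_number = 1:3,
  Public_Health_Unit = c("Unit A", "Unit B", "Unit A"),
  Date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03"))
)

# Every (unit, date) combination, including days with no cases
toy_grid <- expand_grid(
  Public_Health_Unit = unique(toy_cases$Public_Health_Unit),
  Date = seq(as.Date("2020-01-01"), as.Date("2020-01-03"), by = "days")
)

toy_cumulative <- toy_grid %>%
  left_join(toy_cases, by = c("Public_Health_Unit", "Date")) %>%
  group_by(Public_Health_Unit) %>%
  # Unmatched grid rows have NA Case_number and contribute 0
  mutate(Cases = cumsum(!is.na(Case_number))) %>%
  ungroup()
```

Here Unit A stays at 1 on 2020-01-02 and Unit B starts at 0 on 2020-01-01; both rows would be missing without the grid.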